Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA
Abstract
We exploit the resources in the Arabic Treebank (ATB) and Arabic Gigaword (AG) to determine the best features for the novel task of automatically creating lexical semantic verb classes for Modern Standard Arabic (MSA). Verbs are clustered into groups that share semantic elements of meaning, since such verbs exhibit similar syntactic behavior. The clustering results are compared with a gold standard set of classes, which is approximated by using the noisy English translations provided in the ATB to create Levin-like classes for MSA. Cluster quality is found to be sensitive to the inclusion of syntactic frames, LSA vectors, morphological pattern, and subject animacy. The best set of parameters yields an F(β=1) score of 0.456, compared with an F(β=1) score of 0.205 for a random baseline.
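The abstract reports cluster quality as an F(β=1) score against the gold classes and against a random baseline, but it does not say which F-measure variant is used. The following is a minimal Python sketch assuming the class-weighted best-match F-measure common in clustering evaluation, plus a baseline that assigns verbs to clusters uniformly at random. The function names `cluster_f_measure` and `random_baseline`, the toy verb ids, and the choice of F-measure variant are all hypothetical illustrations, not the authors' code.

```python
from collections import Counter
import random


def cluster_f_measure(pred, gold):
    """Class-weighted best-match F-measure (beta = 1). ASSUMED variant.

    pred, gold: dicts mapping each verb to a predicted cluster id / gold class id.
    For every gold class, take the best F1 against any predicted cluster,
    then average over classes, weighting each class by its size.
    """
    verbs = pred.keys() & gold.keys()  # score only verbs present in both
    n = len(verbs)
    class_sizes = Counter(gold[v] for v in verbs)
    cluster_sizes = Counter(pred[v] for v in verbs)
    overlap = Counter((gold[v], pred[v]) for v in verbs)

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k  # share of cluster k that belongs to class c
            recall = n_ck / n_c     # share of class c captured by cluster k
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


def random_baseline(gold, k, trials=1000, seed=0):
    """Mean F-measure when each verb is dropped into one of k clusters at random."""
    rng = random.Random(seed)
    verbs = list(gold)
    scores = []
    for _ in range(trials):
        pred = {v: rng.randrange(k) for v in verbs}
        scores.append(cluster_f_measure(pred, gold))
    return sum(scores) / trials


# Toy example with placeholder verb ids (not ATB data).
gold = {"v1": "A", "v2": "A", "v3": "A", "v4": "B", "v5": "B", "v6": "C"}
pred = {"v1": 0, "v2": 0, "v3": 1, "v4": 1, "v5": 1, "v6": 2}
print(cluster_f_measure(pred, gold))
print(random_baseline(gold, k=3))
```

On this toy data the clustering scores 5/6, and the random baseline sits well below it, mirroring (on a tiny scale) the kind of comparison the abstract reports between 0.456 and 0.205.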
Similar resources
Unsupervised Induction of Modern Standard Arabic Verb Classes
We exploit the resources in the Arabic Treebank (ATB) for the novel task of automatically creating lexical semantic verb classes for Modern Standard Arabic (MSA). Verbs are clustered into groups that share semantic elements of meaning as they exhibit similar syntactic behavior. The results of the clustering experiments are compared with a gold standard set of classes, which is approximated by u...
Unsupervised Induction of Modern Standard Arabic Verb Classes and Alternations
Verbs (in lemma form) and syntactic frames are automatically extracted from the ATB. In order to acquire an argument structure for the verbs, I considered only structure that is internal to the maximal Verb Phrase (VP) projection of the verb. However, within the VP, all sisters of the verb are excluded except for those in a close semantic relationship to the verb. This is facilitated by the f...
A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes
We present an unsupervised method for inducing verb classes from verb uses in gigaword corpora. Our method consists of two clustering steps: verb-specific semantic frames are first induced by clustering verb uses in a corpus and then verb classes are induced by clustering these frames. By taking this step-wise approach, we can not only generate verb classes based on a massive amount of verb use...
Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
The paper describes the application of k-means, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A ...
A Large Coverage Verb Taxonomy for Arabic
In this article I present a lexicon for Arabic verbs which exploits Levin's verb classes (Levin, 1993) and the basic development procedure used by Schuler (2005). The verb lexicon in its current state has 173 classes which contain 4392 verbs and 498 frames providing information about verb root, the deverbal form of the verb, the participle, thematic roles, subcategorisation frames and syntacti...
Publication date: 2006